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Abstract 

A new image denoising algorithm to deal with the additive Gaussian white noise 
model is given. Like the non-local means method, the filter is based on the weighted 
average of the observations in a neighborhood, with weights depending on the simi- 
larity of local patches. But in contrast to the non-local means filter, instead of using 
a fixed Gaussian kernel, we propose to choose the weights by minimizing a tight up- 
per bound of mean square error. This approach makes it possible to define the 
weights adapted to the function at hand, mimicking the weights of the oracle filter. 
Under some regularity conditions on the target image, we show that the obtained 
estimator converges at the usual optimal rate. The proposed algorithm is parameter 
free in the sense that it automatically calculates the bandwidth of the smoothing 
kernel; it is fast and its implementation is straightforward. The performance of the 
new filter is illustrated by numerical simulations. 

Keywords: Non-local means, image denoising, optimization weights, oracle, statistical 
estimation. 



1 Introduction 

We deal with the additive Gaussian noise model 

Y(x) = f(x)+e(x), xel, (1) 

where I is a uniform N x N grid of pixels on the unit square, Y = (Y (x)) xel is the 
observed image brightness, / : [0, l] 2 — > R + is an unknown target regression function 
and e = {£{x)) xe i are independent and identically distributed (i.i.d.) Gaussian random 
variables with mean and standard deviation o > 0. Important denoising techniques for 
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the model (1) have been developed in recent years, see for example Buades, Coll and Morel 
(2005 [1]), Kervrann (2006 [10]), Lou, Zhang, Osher and Bertozzi (2010 [14]), Polzehl 
and Spokoiny (2006 [17]), Garnett, Huegerich and Chui (2005 [8]), Cai, Chan, Nikolova 
(2008 [3]), Katkovnik, Foi, Egiazarian, and Astola ( 2010 [9]), Dabov, Foi, Katkovnik and 
Egiazarian (2006 [2]). A significant step in these developments was the introduction of 
the Non-Local Means filter by Buades, Coll and Morel [1] and its variants (see e.g. [10], 
[11], [14]). In these filters, the basic idea is to estimate the unknown image f(x ) by a 
weighted average of the form 

/^o) = 5>(:r)y(x), (2) 
xei 

where w = (w(x)) xel are some non-negative weights satisfying Yl x ei w ( x ) = 1- The 
choice of the weights w are based essentially on two criteria: a local criterion so that the 
weights are as a decreasing function of the distance to the estimated pixel, and a non-local 
criterion which gives more important weights to the pixels whose brightness is close to 
the brightness of the estimated pixel (see e.g. Yaroslavsky (1985 [25]) and Tomasi and 
Manduchi (1998 [23])). The non-local approach has been further completed by a fruitful 
idea which consists in attaching small regions, called data patches, to each pixel and 
comparing these data patches instead of the pixels themselves. 

The methods based on the non-local criterion consist of a comparatively novel direction 
which is less studied in the literature. In this paper we shall address two problems related 
to this criterion. 

The first problem is how to choose data depending on weights w in (2) in some opti- 
mal way. Generally, the weights w are defined through some priory fixed kernels, often 
the Gaussian one, and the important problem of the choice of the kernel has not been 
addressed so far for the non-local approach. Although the choice of the Gaussian kernel 
seems to show reasonable numerical performance, there is no particular reason to re- 
strict ourselves only to this type of kernel. Our theoretical results and the accompanying 
simulations show that another kernel should be preferred. In addition to this, for the 
obtained optimal kernel we shall also be interested in deriving a locally adaptive rule for 
the bandwidth choice. The second problem that we shall address is the convergence of the 
obtained filter to the true image. Insights can be found in [1], [10], [11] and [13], however 
the problem of convergence of the Non-Local Means Filter has not been completely settled 
so far. In this paper, we shall give some new elements of the proof of the convergence of 
the constructed filter, thereby giving a theoretical justification of the proposed approach 
from the asymptotic point of view. 

Our main idea is to produce a very tight upper bound of the mean square error 

r(im) =E(ux )-f(x )y 

in terms of the bias and variance and to minimize this upper bound in w under the 
constraints w > and ^2 xe iw(x) = 1. In contrast to the usual approach where a specific 
class of target functions is considered, here we give a bound of the bias depending only 
on the target function / at hand, instead of using just a bound expressed in terms of 
the parameters of the class. We first obtain an explicit formula for the optimal weights 
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w* in terms of the unknown function /. In order to get a computable filter, we estimate 
w* by some adaptive weights w based on data patches from the observed image Y. We 
thus obtain a new filter, which we call Optimal Weights Filter. To justify theoretically 
our filter, we prove that it achieves the optimal rate of convergence under some regularity 
conditions on /. Numerical results show that Optimal Weights Filter outperforms the 
typical Non-Local Means Filter, thus giving a practical justification that the optimal 
choice of the kernel improves the quality of the denoising, while all other conditions are 
the same. 

We would like to point out that related optimization problems for non parametric 
signal and density recovering have been proposed earlier in Sacks and Ylvysaker (1978 
[22]), Roll (2003 [19]), Roll and Ljung (2004 [20]), Roll, Nazin and Ljung (2005 [21]), 
Nazin, Roll, Ljung and Grama (2008 [15]). In these papers the weights are optimized 
over a given class of regular functions and thus depend only on some parameters of the 
class. This approach corresponds to the minimax setting, where the resulting minimax 
estimator has the best rate of convergence corresponding to the worst image in the given 
class of images. If the image happens to have better regularity than the worst one, the 
minimax estimator will exhibit a slower rate of convergence than expected. The novelty 
of our work is to find the optimal weights depending on the image / at hand, which 
implicates that our Optimal Weights Filter automatically attains the optimal rate of 
convergence for each particular image /. Results of this type are related to the "oracle" 
concept developed in Donoho and Johnstone (1994 [6]). 

Filters with data- dependent weights have been previously studied in many papers, 
among which we mention Polzehl and Spokoiny (2000 [18], 2003 [16], 2006 [17]), Kervrann 
(2006 [10] and 2007 [12]). Compared with these filters our algorithm is straightforward 
to implement and gives a quality of denoising which is close to that of the best recent 
methods (see Table 2). The weight optimization approach can also be applied with these 
algorithms to improve them. In particular, we can use it with recent versions of the Non- 
Local Means Filter, like the BM3D (see 2006 [2], 2007 [4, 5]); however this is beyond the 
scope of the present paper and will be done elsewhere. 

The paper is organized as follows. Our new filter based on the optimization of weights 
in the introduction in Section 2 where we present the main idea and the algorithm. Our 
main theoretical results are presented in Section 3 where we give the rate of convergence 
of the constructed estimators. In Section 4, we present our simulation results with a brief 
analysis. Proofs of the main results are deferred to Section 5. 

To conclude this section, let us set some important notations to be used throughout 
the paper. The Euclidean norm of a vector x = (x±, ...,Xd) € R d is denoted by ||x|| 2 = 

(^2t=i x fj ■ The supremum norm of x is denoted by ||x||oo — su Pi<i<(2 \ x i\ ■ The cardinality 
of a set A is denoted card A. For a positive integer N the uniform N x iV-grid of pixels 
on the unit square is defined by 



Each element x of the grid I will be called pixel. The number of pixels is n = N 2 . For 
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any pixel xq G I and a given h > 0, the square window of pixels 

U XQ>h = {x el:\\x- xolloo < h} (4) 

will be called search window at rr . We naturally take h as a multiple of jj (h — for some 
A; G {1,2, ••• , TV}). The size of the square search window U XOj h is the positive integer 
number 

M = nh 2 = card U xo ,h- 
For any pixel x G U xo ,h an d a given r] > a second square window of pixels 

V XjV = {y G I : 1 1 2/ - xHoo < 77} (5) 

will be called for short a patch window at re in order to be distinguished from the search 
window U X(h h- Like h, the parameter r\ is also taken as a multiple of j^. The size of the 
patch window V X;V is the positive integer 

m = nrf = card V X0)JJ . 

The vector = (Y (y^gv* v f° rme d by the the values of the observed noisy image 
Y at pixels in the patch V X)J , will be called simply data patch at x G U X0) /i. Finally, the 
positive part of a real number a is denoted by a + , that is 

a if a > 0, 
if a < 0. 

2 Construction of the estimator 

Let h > be fixed. For any pixel x Gl consider a family of weighted estimates fh, w ( x o) 
of the form 

fh,w( x o) = Yl w ( x ) Y ( x )i ( 6 ) 
xev XQih 

where the unknown weights satisfy 

w(x) > and w(x) = 1. (7) 

The usual bias plus variance decomposition of the mean square error gives 

E (/ hlW (a;o) - /Oro)) 2 = ^as 2 + Var, (8) 

with 

2 



.Bias 2 = Yj w ( x ) (f( x ) ~ f( x o)) an d V ar = °" 2 ^ w( 



a;) 2 . 



The decomposition (8) is commonly used to construct asymptotically minimax estimators 
over some given classes of functions in the nonparametric function estimation. In order 
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to highlight the difference between the approach proposed in the present paper and the 
previous work, suppose that / belongs to the class of functions satisfying the Holder 
condition \f(x) — f(y)\ < L\\x — Vx, y e I. In this case, it is easy to see that 

e(7^(xo)-/(x )) 2 < Yl w(x)L\x-x f\ +a 2 w ( x ) 2 - ( 9 ) 

Optimizing further the weights w in the obtained upper bound gives an asymptotically 
minimax estimate with weights depending on the unknown parameters L and j3 (for 
details see [22]). With our approach the bias term Bias 2 will be bounded in terms of the 
unknown function / itself. As a result we obtain some "oracle" weights w adapted to the 
unknown function / at hand, which will be estimated further using data patches from the 
image Y. 

First, we shall address the problem of determining the "oracle" weights. With this 
aim denote 

Pf, X0 (x) = \f(x)-f(x )\. (10) 

Note that the value pf jX0 (x) characterizes the variation of the image brightness of the 
pixel x with respect to the pixel xq. From the decomposition (8), we easily obtain a tight 
upper bound in terms of the vector pf >XQ : 

e(/,(x )-/(x )) 2 <^ o H, (ii) 

where 

From the following theorem we can obtain the form of the weights w which minimize 
the function g pfxo (w) under the constraints (7) in terms of the values pf >xo (x) . For the 
sake of generality, we shall formulate the result for an arbitrary non-negative function 
p(x) , x G U xo ,h- Define the objective function 

Introduce into consideration the strictly increasing function 

M P (t)= J2 P(x)(t- P (x)) + , t>0. (14) 

Let K tr be the usual triangular kernel: 

K tv (t) = (l-\t\) + , teR 1 . (15) 
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Theorem 1 Assume that p(x) , x G U Xo ,h, is a non-negative function. Then the unique 
weights which minimize g p (w) subject to (7) are given by 

K tr ( p ^ ) 

M x ) = ^ tr K a t ^y x G Ux °> h ' (16) 

where the bandwidth a > is the unique solution on (0, oo) of the equation 

M p (a) = a 2 . (17) 

Theorem 1 can be obtained from a result of Sacks and Ylvysaker [22]. The proof is 
deferred to Section 5.1. 

Remark 2 The value of a > can be calculated as follows. We sort the set {p(x) \ x G 
^Jx ,h} in the ascending order = p\ < p 2 < • • • < pu < Pm+i = +oo ; where M = 
Card\J X0:h . Let 

a k = l<k<M, (18) 

i=i 

and 

k* = max{l <k<M\a k >p k } = min{l < k < M \ a k < p k } — 1, (19) 

with the convention that a k = oo if p k = and that min = M + l. Then the solution a > 
of (17) can be expressed as a = a k *; moreover, k* is the unique integer k G {1, • • • , M} 
such that a k > p k and a k+1 < p k+1 if k < M. 

The proof of the remark is deferred to Section 5.2. 

Let Xq G I. Using the optimal weights given by Theorem 1, we first introduce the 
following non computable approximation of the true image, called "oracle": 



where the bandwidth a is the solution of the equation M Pf x (a) = a 2 . A computable filter 
can be obtained by estimating the unknown function pf tXQ (x) and the bandwidth a from 
the data as follows. 

Let h > and i] > be fixed numbers. For any x G I and any x G U Xo ,h consider a 
distance between the data patches Y Xj1) = (Y (y)) yeW and Y xo>r? = (Y (y)) J/GV defined 
by ' ' "" 

d 2 (Y Y ) = — \\Y — Y II 2 

u V x x a;o>^/ II x x,V x a:o,'?ll2' 

TXX 
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where m — card V^, and ||Y X) ,j — Y^^l^ — £ (Y(x + z)-Y(x + z)). Since 

Buades, Coll and Morel [1] the distance d 2 (Y Xj7? , Y XQ!V ) is known to be a flexible tool to 
measure the variations of the brightness of the image Y. As 

Y(x + z) - Y(x + z) — f(x + z) - f(x + z) + e(x + z) - e(x + z) 

we have 

E(Y(x + z)- Y(x + z)f = (f{x + z)- f(x + z)f + 2a 2 . 
If we use the approximation 

(/(* + z)- f(x + z)f « (f(x) - f(x )f = pl Xo (x) 

and the law of large numbers, it seems reasonable that 

pl X0 (x)^d 2 (Y x ^Y XlhV )-2a 2 . 

But our simulations show that a much better approximation is 

Pf,x (x) ~ Px (x) = (d(Y Xir) ,Y XOir) ) - y/2a) . (21) 

The fact that p xo (x) is a good estimator of pf >xo will be justified by convergence theorems: 
cf. Theorems 4 and 5 of Section 3. Thus our Optimal Weights Filter is defined by 

f(x ) = / M (s ) = ^ J{n( p InW) . ( 22 ) 

where the bandwidth a > is the solution of the equation M-p (a) = a 2 , which can be 
calculated as in Remark 2 (with p(x) and a replaced by f) Xo { x ) an d a respectively). We 
end this section by giving an algorithm for computing the filter (22). The input values of 
the algorithm are the image Y (x) , x e I , the variance of the noise a and two numbers 
m and M representing the sizes of the patch window and the search window respectively. 

Algorithm : Optimal Weights Filter 
Repeat for each x e I 

give an initial value of a: a = 1 (it can be an arbitrary positive number). 

compute {p xo (x) | x G U xo , h } by (21) 
/ compute the bandwidth a at xq 

reorder {p Xo (x) \ x € U Xo ,h} as increasing sequence, say 

PxoM < PxoM < ■ ■ < Pxq(xm) 

loop from k — 1 to M 

if £?=1 PxoM > 

if ^ 2 +Etipg (^i) > then - = ^ 2 +Etipg (^) 

Ei=i pa=o( a; «) Ei=i p^o ( Xi ) 

else quit loop 

else continue loop 
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end loop 

/ compute the estimated weights w at x 




I compute the filter f at x 



compute f(x ) = Y JXl& u xoM ™( x i) Y ( x i)- 



The proposed algorithm is computationally fast and its implementation is straightfor- 
ward compared to more sophisticated algorithms developed in recent years. Notice that 
an important issue in the non-local means filter is the choice of the bandwidth parameter 
in the Gaussian kernel; our algorithm is parameter free in the sense that it automatically 
chooses the bandwidth. 

The numerical simulations show that our filter outperforms the classical non-local 
means filter under the same conditions. The overall performance of the proposed filter 
compared to its simplicity is very good which can be a big advantage in some practical 
applications. We hope that optimal weights that we deduced can be useful with more 
complicated algorithms and can give similar improvements of the denoising quality. How- 
ever, these investigations are beyond the scope of the present paper. A detailed analysis 
of the performance of our filter is given in Section 4. 



In this section, we present two theoretical results. 

The first result is a mathematical justification of the "oracle" filter introduced in the 
previous section. It shows that despite the fact that we minimized an upper bound of 
the mean square error instead of the mean square error itself, the obtained "oracle" still 
has the optimal rate of convergence. Moreover, we show that the weights optimization 
approach possesses the following important adaptivity property: our procedure automat- 
ically chooses the correct bandwidth a > even if the radius h > of the search window 
U XQt h is larger than necessary. 

The second result shows the convergence of the Optimal Weights Filter f h:V under some 
more restricted conditions than those formulated in Section 2. To prove the convergence, 
we split the image into two independent parts. From the first one, we construct the 
"oracle" filter; from the second one, we estimate the weights. Under some regularity 
assumptions on the target image we are able to show that the resulting filter has nearly 
the optimal rate of convergence. 

Let p (x) , x G U XOj /i, be an arbitrary non-negative function and let w p be the optimal 
weights given by (16). Using these weights w p we define the family of estimates 



depending on the unknown function p. The next theorem shows that one can pick up a 
useful estimate from the family if the the function p is close to the "true" function 



3 Main results 




(23) 
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Pf,x (x) = 1/0*0 -f(x )\, i.e. if 

p(x) = \f(x)-f(xo)\+S n , (24) 

where <5 n > is a small deterministic error. We shall prove the convergence of the estimate 
fl under the local Holder condition 

|/(z) - f(y)\ < L\\x - y\C Vx, y E U xo , h , (25) 

where j3 > is a constant, ft, > 0, and x e I. 

In the following, q > (i > 1) denotes a positive constant, and 0(a n ) (n > 1) denotes 
a number bounded by c • a n for some constant c > 0. All the constants q > and c > 
depend only on L, (3 and <r; their values can be different from line to line. 

i 

Theorem 3 Assume that h = C\n~ " 2 P+ 2 with C\ > Cq = ^ - ^g^,2^ +2 ^ 2,9+2 ; or h > C\rT a 
with < a < an< ^ Cl > Suppose that f satisfies the local Holder's condition (25) 
and that 5 n = O |n"W j . Then 

E (A*(x ) - /K)) 2 = O fn-^?) . (26) 



The proof will be given in Section 5.3. 

i 

Recall that the bandwidth h of order n 2 + 2 ' 3 is required to have the optimal minimax 

rate of convergence O (n~ 2+2 P j of the mean squared error for estimating the function / of 

global Holder smoothness j3 (cf. e.g. [7]). To better understand the adaptivity property 
of the oracle f£(xo), assume that the image / at x has Holder smoothness (3 (see [24]) 

and that h > con" a with < a < t^^, which means that the radius h > of the search 

i 

window U X0:h has been chosen larger than the "standard" n 2 < 3 + 2 . Then, by Theorem 3, 

the rate of convergence of the oracle is still of order n 2 + 2 P , contrary to the global case 
mentioned above. If we choose a sufficiently large search window U XOt h, then the oracle 
fh( x o) w ill have a rate of convergence which depends only on the unknown maximal local 
smoothness /3 of the image /. In particular, if (3 is very large, then the rate will be close 
to n" 1 / 2 , which ensures good estimation of the flat regions in cases where the regions are 
indeed flat. More generally, since Theorem 3 is valid for arbitrary j3, it applies for the 
maximal local Holder smoothness (3 Xo at xq, therefore the oracle fh( x o) will exhibit the 



best rate of convergence of order n 2+2fe o at x . In other words, the procedure adapts to 
the best rate of convergence at each point x of the image. 

We justify by simulation results that the difference between the oracle computed 
with p = pf tXQ = \ f (x) — / (xq) \ , and the true image /, is extremely small (see Table 1). 
This shows that, at least from the practical point of view, it is justified to optimize the 
upper bound g Pfxo (w) instead of optimizing the mean square error E(/£(xo) — f{ x o)) 2 
itself. 

The estimate f£ with the choice p (x) = pf xo (x) will be called oracle filter. In partic- 
ular for the oracle filter /£, under the conditions of Theorem 3, we have 



o 2/3 

E (fh(x ) - f(x )f < g p (w p ) < cn—?. 
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Now, we turn to the study of the convergence of the Optimal Weights Filter. Due to 
the difficulty in dealing with the dependence of the weights we shall consider a slightly 
modified version of the proposed algorithm: we divide the set of pixels into two indepen- 
dent parts, so that the weights are constructed from the one part, and the estimation of 
the target function is a weighted mean along the other part. More precisely, assume that 
x e I, h > and rj > 0. To prove the convergence we split the set of pixels into two parts 

I = I.o Ul " , Where 

I' X0 = jx + (jj, j^j e I : i +j is even J (27) 

is the set of pixels with an even sum of coordinates % + j and I XQ = T\V XQ . Denote 
U x h = h H I^. o and V xr) = V xr) nl" . Consider the distance between the data patches 
Yi',, = (Y (y)) yev ,, v and = (Y (y))^",., defined h J 

d(Y" Y" ) = 1 llY" -Y" II 

" v x,i}i ^xow) y^rnf' x ' r ' x o,v\\2 , 

where m" = card V"^. An estimate of the function pf >Xo is given by 

P/,,o (x) « K (x) = (d (Y» „, YIJ -V2a) + , (28) 
see (21). Define the filter f' h by 

E ^WO*), (29) 

where 



w" = argmin ^ w(x)j^ (x) + <r 2 ^ w 2 (x). (30) 

The next theorem gives a rate of convergence of the Optimal Weights Filter if the 
parameters h > and rj > are chosen properly according to the local smoothness f3. 

Theorem 4 Assume that h = Cin~ 2 ' 3 + 2 wift ci > c = ^g^l^" 1 " 2 ^ 2/?+2 ; 

?7 = C2n _2 ' 9 + 2 . Suppose that function f satisfies the local Holder condition (25). Then 

E {T h ,r,M - f(x )) 2 = O (n~^ Inn) . (31) 
For the proof of this theorem see Section 5.4. 

Theorem 4 states that with the proper choices of the parameters h and rj, the mean 
square error of the estimator fh jV (x ) converges nearly at the rate 0(n 2 "+ 2 ) which is the 
usual optimal rate of convergence for a given Holder smoothness f3 > (cf. e.g. [7]). 

Simulation results show that the adaptive bandwidth a provided by our algorithm 
depends essentially on the local properties of the image and does not depend much on 
the radius h of the search window. These simulations, together with Theorem 3, suggest 
that the Optimal Weights Filter (22) can also be applied with larger h, as is the case of 
the "oracle" filter /£. The following theorem deals with the case where h is large. 
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Theorem 5 Assume that h = C\n a with c\ > 0, and < a < jpyl an< ^ ^ ria ^ V = 
c 2 n~w. Suppose that the function f satisfies the local Holder condition (25). Then 

E (? h ,r,(x ) - f(x )) 2 = O (n-*&* Inn) . 

For the proof of this theorem see Section 5.5. Note that in this case the obtained rate 
of convergence is not the usual optimal one, in contrast to Theorems 3 and 4, but we 
believe that this is the best rate that can be obtained for the proposed filter. 



4 Numerical performance of the Optimal Weights 
Filter 

The performance of the Optimal Weights Filter fh, v (x ) is measured by the usual Peak 
Signal-to-Noise Ratio (PSNR) in decibels (db) defined as 

PSNR = 10 log 10 — , MSB = - f Kv {x))\ 

xei 

where / is the original image, and / the estimated one. 

In the simulations, we sometimes shall use the smoothed version of the estimate of 
brightness variation dx (Y Xj7J , Y X0)J? ) instead of the non smoothed one d(Y XiV ,Y XOtV ) . It 
should be noted that for the smoothed versions of the estimated brightness variation 
we can establish similar convergence results. The smoothed estimate dx (Y XjJ? , Y XOtV ) is 
defined by 

. rv v ^_ ll^(y)-(Yx,,-Y^)|| 2 

u>K \ 1 x,rji 1 xo,v) I ' 

where K are some weights defined on V XOi?) . The corresponding estimate of brightness 
variation pf jXQ (x) is given by 

Pk,x (x) = (d K (Y Xtr] ,Y XOtr] ) - y/2<rj . (32) 
With the rectangular kernel 

= < 33 » 

we obtain exactly the distance d (Y XjJ? , Y XOj7? ) and the filter described in Section 2. Other 
smoothing kernels K used in the simulations are the Gaussian kernel 

* 9M _ P (J^E), (34) 

where h g is the bandwidth parameter and the following kernel 

K (y) = \ p^'^T"- ^ lit T (35) 

z^k=i (2fe+i) 2 11 y ~ x o, 

n 



Images 
Sizes 


Lena 
512 X 512 


Barbara 
512 X 512 


Boat 
512 x 512 


House 
256 x 256 


Peppers 
256 X 256 


lit i "' .\ 1 / > 

cr/PbNR 


10/28. 12db 


10/28. 12db 


10/28. 12db 


10/28. lldb 


10/28. lldb 


11 V 1 1 

11 X 11 

13 x 13 
15 x 15 
17 x 17 


A 1 Ofl^K 

41.92db 
42.54db 
43.07db 


4U.UOQD 

40.82db 
41.48db 
42.05db 


40.99db 
41.62db 
42.79db 


A1 crn^lK 
41.0UQD 

42.24db 
42.85db 
43.38db 


Af\ Qfi^K 
4U.OUQD 

41.01db 
41.53db 
41.99db 


a/PSN R 


20/22. lldb 


20/22. lldb 


20/22. lldb 


20/28. 12db 


20/28. 12db 


11 X 11 
13 x 13 
15 x 15 
17 x 17 


37.17db 
37.91db 
38.57db 
39.15db 


35.92db 
36.70db 
37.37db 
37.95db 


36.23db 
37.01db 
37.65db 
38.22db 


37.18db 
37.97db 
38.59db 
39. lldb 


36.25db 
36.85db 
37.38db 
37.80db 


cr/PSNR 


30/18.60db 


30/18. 60db 


30/18.60db 


30/18. 61db 


30/18. 61db 


11 x 11 
13 x 13 
15 x 15 
17 x 17 


34.81db 
35.57db 
36.24db 
36.79db 


33.65db 
34.47db 
35.15db 
35.75db 


33.79db 
34.58db 
35.25db 
35.84db 


34.93db 
35.78db 
36.48db 
37.07db 


33.57db 
34.23db 
34.78db 
35.26db 



Table 1: PSNR values when oracle estimator ft is applied with different values of M. 
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Figure 1: The shape of the kernels K g (left) and Kq (right) with M = 21 x 21. 

with the width of the similarity window m = (2p + l) 2 . The shape of these two kernels 
are displayed in Figure 1. 

To avoid the undesirable border effects in our simulations, we mirror the image outside 
the image limits, that is we extend the image outside the image limits symmetrically with 
respect to the border. At the corners, the image is extended symmetrically with respect 
to the corner pixels. 

We have done simulations on a commonly-used set of images available at http://decsai. 
ugr.es/javier/denoise/test images/ which includes Lena, Barbara, Boat, House, Peppers. 
The potential of the estimation method is illustrated with the 512 x 512 image "Lena" 
(Figure 2(a)) and "Barbara" (Figure 3(a) ) corrupted by an additive white Gaussian noise 
(Figures 2(b), PSNR= 22.10rf6, a = 20 and 3 (b), PSNR= 18.60, a = 30 ). We first used 
the rectangular kernel Kq for computing the estimated brightness variation function f)R,x , 
which corresponds to the Optimal Weights Filter as defined in Section 2. Empirically we 
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found that the parameters m and M can be fixed to m = 21 x 21 and M = 13 x 13. 
In Figures 2(c) and 3(c), we can see that the noise is reduced in a natural manner 
and significant geometric features, fine textures, and original contrasts are visually well 
recovered with no undesirable artifacts (PSNR= 32.52^6 for "Lena" and PSNR = 28.89 
for "Barbara"). To better appreciate the accuracy of the restoration process, the square 
of the difference between the original image and the recovered image is shown in Figures 
2(d) and 3(d), where the dark values correspond to a high-confidence estimate. As 
expected, pixels with a low level of confidence are located in the neighborhood of image 
discontinuities. For comparison, we show the image denoised by Non-Local Means Filter 
in Figures 2(e),(f) and 3 (e), (f). The overall visual impression and the numerical results 
are improved using our algorithm. 

The Optimal Weights Filter seems to provide a feasible and rational method to detect 
automatically the details of images and take the proper weights for every possible geomet- 
ric configuration of the image. For illustration purposes, we have chosen a series of search 
windows XJ X0 ,h with centers at some testing pixels Xo on the noisy image, see Figure 4 
The distribution of the weights inside the search window U XOt h depends on the estimated 
brightness variation function Pk,x (x) , x & U Xo ,h- If the estimated brightness variation 
Pk,x ( x ) is l ess than a (see Theorem 1), the similarity between pixels is measured by a 
linear decreasing function of Pk,x (x) ; otherwise it is zero. Thus a acts as an automatic 
threshold. In Figure 5, it is shown how the Optimal Weights Filter chooses in each case 
a proper weight configuration. 

The best numerical results are obtained using K = K g and K = K Q in the definition of 
Pk, xo - 111 Table 2, we compare the Non-Local Mean Filter and the Optimal Weights filter 
with different choices of the kernel: K = K g , K , K r . The best PSNR values we obtained 
by varying the size m of the similarity windows and the size M of the search windows 
are reported in Tables 3 (a = 10), 4 (a = 20) and 5 (a = 30) for K = K . Note that 
the PSNR values are close for every m and M and the optimal m and M depend on the 
image content. The values m = 21 x 21 and M = 13 x 13 seem appropriate in most cases 
and a smaller patch size m can be considered for processing piecewise smooth images. 



5 Proofs of the main results 
5.1 Proof of Theorem 1 

We begin with some preliminary results. The following lemma can be obtained from 
Theorem 1 of Sacks and Ylvisaker [22]. For the convenience of readers, we prefer to give 
a direct proof adapted to our situation. 

Lemma 6 Let g p (w) be defined by (13). Then there are unique weights w p which minimize 
g p (w) subject to (7), given by 

Wp ( x ) = l(b-Xp(x)) + , (36) 
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Figure 2: Results of denoising "Lena" 512 x 512 image. Comparing (d) and (f) we see that the 
Optimal Weights Filter (OWF) captures more details than the Non-Local Means Filter (NLMF). 
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(a) Original image "Barbara" (b) Noisy image with a = 30, PSNR = 18.60(i6 




(e) Restored with NLMF, PSNR= 27.88dfe (f) Square error with NLMF 



Figure 3: Results of denoising "Barbara" 512 x 512 image. Comparing (d) and (f) we see 
that the Optimal Weights Filter (OWF) captures more details than the Non-Local Means Filter 
(NLMF). 15 



Figure 4: The noisy image with six selected search windows with centers at pixels a, b, c, d, e, 
f. 



Images 
Sizes 


Lena 
512 X 512 


Barbara 
512 X 512 


Boat 
512 x 512 


House 
256 x 256 


Peppers 
256 X 256 


a/PSNR 


10/28.12db 


10/28. 12db 


10/28.12db 


10/28. lldb 


10/28. lldb 


OWF with K r 
OWF with Kg 
OWF with K 
NLMF 


35.23db 
35.49db 
35.52db 
35.03db 


33.89db 
34.13db 
34.10db 
33.77db 


33.07db 
33.40db 
33.48db 
32.85db 


35.57db 
35.83db 
35.80db 
35.43db 


33.74db 
33.97db 
33.96db 
33.27db 


a/PSNR 


20/22.11db 


20/22. lldb 


20/22. lldb 


20/28. 12db 


20/28. 12db 


OWF with K r 

OWF With Kg 

OWF with K 
NLMF 


32.24db 
32.61db 
32.52db 
31.73db 


30.71db 
31.01db 
31.00db 
30.36db 


29.65db 
30.05db 
30.20db 
29.58db 


32.59db 
32.88db 
32.90db 
32.51db 


30.17db 
30.44db 
30.66db 
30. lldb 


a/PSNR 


30/18.60db 


30/18. 60db 


30/18.60db 


30/18. 61db 


30/18. 61db 


OWF with K r 

OWF With Kg 

OWF with K 
NLMF 


30.26db 
30.66db 
30.50db 
29.56db 


28.59db 
28.97db 
28.89db 
27.88db 


27.69db 
28.05db 
28.23db 
27.50db 


30.49db 
30.81db 
30.80db 
30.02db 


27.93db 
28.16db 
28.49db 
27.77db 



Table 2: Comparison between the Non-Local Means Filter (NLMF) and the Optimal Weights 
Filter (OWF). 
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Original image Noisy image 2D representation 3D representation Restored image 

of the weights of the weights 




Figure 5: These pictures show how the Optimal Weights Filter detects the features of the 
image by choosing appropriate weights. The first column displays six selected search windows 
used to estimate the image at the corresponding central pixels a, b, c, d, e and f. The second 
column displays the corresponding search windows corrupted by a Gaussian noise with standard 
deviation a = 20. The third column displays the two-dimensional representation of the weights 
used to estimate central pixels. The fourth column gives the three-dimensional representation 
of the weights. The fifth column gives the restored images. 
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a = 10 


Lena 


Barbara 


Boat 


House 


Peppers 


m/M 


512 x 512 


512 x 512 


512 x 512 


256 x 256 


256 x 256 


11 x 11/11 x 11 


35.35db 


34.03db 


33.43db 


35.69db 


34.16db 


13 x 13/11 x 11 


35.40db 


34.06db 


33.45db 


35.72db 


34.14db 


15 x 15/11 x 11 


35.44db 


34.07db 


33.47db 


35.73db 


34.10db 


17 x 17/11 x 11 


35.47db 


34.08db 


33.47db 


35.74db 


34.06db 


19 x 19/11 x 11 


35.50db 


34.07db 


33.48db 


35.74db 


34.02db 


21 x 21/11 x 11 


35.52db 


34.06db 


33.47db 


35.73db 


33.97db 


11 x 11/13 x 13 


35.35db 


34.08db 


33.43db 


35.77db 


34.15db 


13 x 13/13 x 13 


35.40db 


34.11db 


33.46db 


35.79db 


34.12db 


15 x 15/13 x 13 


35.44db 


34.12db 


33.47db 


35.80db 


34.09db 


17 x 17/13 x 13 


35.47db 


34.12db 


33.48db 


35.81db 


34.05db 


19 x 19/13 x 13 


35.50db 


34.12db 


33.48db 


35.81db 


34.01db 


21 x 21/13 x 13 


35.52db 


34.10db 


33.48db 


35.80db 


33.96db 


11 x 11/15 x 15 


35.33db 


34.11db 


33.43db 


35.82db 


34.14db 


13 x 13/15 x 15 


35.39db 


34.13db 


33.45db 


35.84db 


34.11db 


15 x 15/15 x 15 


35.43db 


34.14db 


33.47db 


35.85db 


34.08db 


17 x 17/15 x 15 


35.47db 


34.14db 


33.48db 


35.86db 


34.04db 


19 x 19/15 x 15 


35.49db 


34.14db 


33.48db 


35.85db 


34.00db 


21 x 21/15 x 15 


35.52db 


34.12db 


33.48db 


35.84db 


33.96db 


11 x 11/17 x 17 


35.32db 


34.13db 


33.42db 


35.86db 


34.12db 


13 x 13/17 x 17 


35.37db 


34.15db 


33.44db 


35.88db 


34.10db 


15 x 15/17 x 17 


35.42db 


34.16db 


33.46db 


35.89db 


34.07db 


17 x 17/17 x 17 


35.46db 


34.16db 


33.47db 


35.89db 


34.03db 


19 x 19/17 x 17 


35.48db 


34.15db 


33.47db 


35.88db 


34.00db 


21 x 21/17 x 17 


35.51db 


34.14db 


33.47db 


35.87db 


33.95db 



Table 3: PSNR values when Optimal Weights Filter with K = Kq is applied with different 
values of m and M (a = 10). 



cr = 20 


Lena 


Barbara 


Boat 


House 


Peppers 


m/M 


512 X 512 


512 X 512 


512 X 512 


256 x 256 


256 X 256 


11 x 11/11 x 11 


32.08db 


30.60db 


30.00db 


32.56db 


30.65db 


13 x 13/11 x 11 


32.20db 


30.70db 


30.06db 


32.64db 


30.68db 


15 x 15/11 x 11 


32.30db 


30.78db 


30.11db 


32.71db 


30.70db 


17 x 17/11 x 11 


32.39db 


30.84db 


30.15db 


32.76db 


30.70db 


19 x 19/11 x 11 


32.47db 


30.88db 


30.18db 


32.79db 


30.70db 


21 x 21/11 x 11 


32.53db 


30.91db 


30.21db 


32.81db 


30.69db 


11 x 11/13 x 13 


32.06db 


30.67db 


29.99db 


32.63db 


30.61db 


13 x 13/13 x 13 


32.18db 


30.78db 


30.05db 


32.71db 


30.64db 


15 x 15/13 x 13 


32.29db 


30.86db 


30.10db 


32.79db 


30.66db 


17 x 17/13 x 13 


32.38db 


30.92db 


30.14db 


32.84db 


30.67db 


19 x 19/13 x 13 


32.46db 


30.97db 


30.18db 


32.88db 


30.67db 


21 x 21/13 x 13 


32.52db 


31.00db 


30.20db 


32.90db 


30.66db 


11 x 11/15 x 15 


32.02db 


30.71db 


29.97db 


32.67db 


30.56db 


13 x 13/15 x 15 


32.15db 


30.82db 


30.03db 


32.76db 


30.59db 


15 x 15/15 x 15 


32.26db 


30.90db 


30.08db 


32.83db 


30.62db 


17 x 17/15 x 15 


32.35db 


30.96db 


30.12db 


32.89db 


30.63db 


19 x 19/15 x 15 


32.43db 


31.01db 


30.16db 


32.92db 


30.63db 


21 x 21/15 x 15 


32.50db 


31.04db 


30.19db 


32.94db 


30.63db 


11 x 11/17 x 17 


31.97db 


30.72db 


29.94db 


32.70db 


30.52db 


13 x 13/17 x 17 


32.10db 


30.83db 


30.00db 


32.79db 


30.56db 


15 x 15/17 x 17 


32.22db 


30.92db 


30.05db 


32.86db 


30.58db 


17 x 17/17 x 17 


32.32db 


30.98db 


30.10db 


32.92db 


30.59db 


19 x 19/17 x 17 


32.40db 


31.02db 


30.13db 


32.96db 


30.60db 


21 x 21/17 x 17 


32.47db 


31.06db 


30.17db 


32.98db 


30.60db 



Table 4: PSNR values when Optimal Weights Filter with K = Kq is applied with different 
values of m and M (a = 20) . 
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a = 30 


Lena 


Barbara 


Boat 


House 


Peppers 


m/M 


512 X 512 


512 X 512 


512 X 512 


256 x 256 


256 X 256 


11 x 11/11 x 11 


29.96db 


28.38db 


27.96db 


30.26db 


28.36db 


13 x 13/11 x 11 


30.10db 


28.53db 


28.03db 


30.39db 


28.43db 


15 x 15/11 x 11 


30.23db 


28.65db 


28.10db 


30.50db 


28.47db 


17 x 17/11 x 11 


30.34db 


28.75db 


28.15db 


30.58db 


28.50db 


19 x 19/11 x 11 


30.43db 


28.83db 


28.20db 


30.65db 


28.51db 


21 x 21/11 x 11 


30.50db 


28.81db 


28.23db 


30.70db 


28.52db 


11 x 11/13 x 13 


29.94db 


28.42db 


27.95db 


30.35db 


28.30db 


13 x 13/13 x 13 


30.08db 


28.58db 


28.02db 


30.49db 


28.37db 


15 x 15/13 x 13 


30.21db 


28.70db 


28.09db 


30.60db 


28.42db 


17 x 17/13 x 13 


30.32db 


28.80db 


28.14db 


30.68db 


28.46db 


19 x 19/13 x 13 


30.42db 


28.88db 


28.19db 


30.75db 


28.48db 


21 x 21/13 x 13 


30.50db 


28.89db 


28.23db 


30.80db 


28.49db 


11 x 11/15 x 15 


29.89db 


28.43db 


27.92db 


30.39db 


28.23db 


13 x 13/15 x 15 


30.04db 


28.58db 


27.99db 


30.53db 


28.30db 


15 x 15/15 x 15 


30.17db 


28.71db 


28.06db 


30.64db 


28.36db 


17 x 17/15 x 15 


30.28db 


28.81db 


28.11db 


30.73db 


28.40db 


19 x 19/15 x 15 


30.38db 


28.89db 


28.16db 


30.80db 


28.43db 


11 x 11/17 x 17 


29.82db 


28.40db 


27.89db 


30.39db 


28.18db 


13 x 13/17 x 17 


29.98db 


28.56db 


27.96db 


30.54db 


28.26db 


15 x 15/17 x 17 


30.11db 


28.69db 


28.02db 


30.66db 


28.31db 


17 x 17/17 x 17 


30.22db 


28.79db 


28.08db 


30.76db 


28.36db 


19 x 19/17 x 17 


30.33db 


28.87db 


28.13db 


30.84db 


28.39db 


21 x 21/17 x 17 


30.42db 


28.96db 


28.17db 


30.89db 


28.41db 



Table 5: PSNR values when Optimal Weights Filter with K = K$ is applied with different 
values of m and M (a = 30) . 

where b and A are determined by 

E ^-V(*)) + = 1, (37) 

E j- 2 (b-\ P (x))+p(x) = A. (38) 

Proof. Let w' be a minimizer of g p (w) under the constraint (7). According to Theorem 
3.9 of Whittle (1971 [24]), there are Lagrange multipliers b > and b (x) > 0, x E U Xihh , 
such that the function 

G(w) = g p (w)-2b( E w(x)-l)-2 E Ux)w{x) 

is minimized at the same point w'. Since the function G is strictly convex it admits a 
unique point of minimum. This implies that there is also a unique minimizer of g p (w) 
under the constraint (7) which coincides with the unique minimizer of G. 

Let w p be the unique minimizer of G satisfying the constraint (7). Again, using the 
fact that G is strictly convex, for any x G U Xo ,h, 



" G{w) 



dw (x) 



2 ( E w p(v)p(v) I P( x ) + ^ 2 w P {x) -2b- 2b (x) > 0. (39) 



Note that in general we do not have an equality in (39). In addition, by the Karush- 
Kuhn- Tucker condition, 

b (x)w p (x) = 0. (40) 
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Let 

Then (39) becomes 
d 



(41) 



dw (x) 



G(w) 



= Xp(x) + a 2 w p {x) - b- b (x) > 0, x e U XOth . (42) 



If bo(x) = 0, then, with respect to the single variable w(x) the function G(w) attains 
its minimum at an interior point w p (x) > 0, so that we have 







-G(w) 



dw (x) 

From this we obtain b — \p(x) = aw p (x) > 0, so 

(b-Xp(x)) 



= \p(x) + cr 2 w p (x) — 6 = 0. 



w p (x) 

u 

If bo(x) > 0, by (40), we have w p (x) = 0. Consequently, from (42) we have 

b - Xp(x) < -b (x) < 0, (43) 

so that we get again 



w p (x) = 



(b-xp(x)y 



As to the conditions (37) and (38), they follow immediately from the constraint (7) and 
the equation (41). ■ 

Proof of Theorem 1. Applying Lemma 6 with b = \a, we see that the unique op- 
timal weights w minimizing g p (w) subject to (7), are given by 



where a and A satisfy 



and 



Since the function 



w p = -^(a- p(x)) + , 
A (a-p(x))+ = a 2 

E (a- P (x)) + p(x) = a 2 . 



(44) 
(45) 

(46) 



XQ,h 



M P (t)= £ (t-p(x)) + p(x) 



is strictly increasing and continuous with M p (0) = and lim M p (t) = +oo, the equation 

t— >oo 

M p (a)=a 2 
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has a unique solution on (0, oo). By (45), 



^2 

T = ( a -p(^)) + > 



A 

which together with (44) imply (16) and (17). 

5.2 Proof of Remark 2 

Expression (14) can be rewritten as 

M 

M p (t) = j2^(t-Pi) + - (47) 
i=i 

Since function M p (t) is strictly increasing with M p (0) = and M p (+oo) = +oo, equation 
(17) admits a unique solution a on (0, +oo), which must be located in some interval 
[Pk , Pk +i), 1 < < ^f, where pm+i = oo (see Figure 6). Hence the equation (17) 
becomes 

Y jPl {a- Pl ) = a\ (48) 
i=i 

where p£ < a < Pfc +i. From (48), it follows that 

^ 2 + E>? 

« = — , Pk <a< Pfco+i- (49) 

Eft 

i=l 

We now show that ko = k* (so that a = k = k*), where k* := max{l < k < M \ > 
Pk}- To this end, it suffices to verify that a,k > Pk an d a k < Pk if < k < M. We have 
already seen that dk > Pk ] if k < k < M, then au < Pk () +i < Pk, so that 

ko k kg k kg k 

(° 2 + E pi) + E pi «fc E Pi + E pi pk E pi + E p^p* 

1=1 i=fco+l i=l fco + 1 i=l fco + l /rn\ 

Ofe = 1 = 1 < 1 = Pk- (50) 

ypi e pi e pi 

i=l i=l i=l 

We finally prove that if 1 < k < M and < p*,, then afc + i < p*;+i, so that the last 
equality in (19) holds and that k* is the unique integer k e {1, • • • , M} such that a& > p^ 
and Gtfc+i < pfc + i if 1 < k < M. In fact, for 1 < k < M, the inequality afc < p^ implies 
that 



i=i i=i 
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,0 a 

Pi P2 ■ ■ ■ Pk Q Pfco+1 ' " Pm 

Figure 6: The number axis of pi, i = 1, 2, ■ ■ ■ , M. 
This, in turn, implies that 

k k 

° 2 + E p! + pI+i pk E Pi + Pfe+i 
= ^ < ^ Pfc+i- 

E pi E ^ 

i=i i=i 



5.3 Proof of Theorem 3 

First assume that p{x) = Pf, xo ( x ) — \f( x ) ~ f( x o)\- Recall that g p and w p were defined 
by (13) and (16). Using Holder's condition (25) we have, for any w, 

9p{w p ) < g p {w) < g(w), 

where 

d( w ) =1 W ( X ) L \\ X ~ x oWL \ +°~ 2 w2 ( x )- 

In particular, denoting w = argmin^^(w), we get 

9p{w P ) < g(w). 

By Theorem 1, 

w(x) = {a - L\\x - I ( a ~ L \\ x - x oWL) + i 

where a > is the unique solution on (0, oo) of the equation M h (a) = a 2 , with 
M h (t) = L W X ~ x ofoo(t ~ L\\x - xoW^ , t>0. 

Theorem 3 will be a consequence of the following lemma. 

Lemma 7 Assume that p{x) = L\\x — XqW 1 ^ and that h > C\n~ a with < a < 1^2, or 
h = Cl n"^ with Cl > co = ( *W+W+*) yn*. Then 

/3/(2/3 + 2) (1+o(1)) (51) 

and 

2)3 

g(w) < c 4 n 2 + 2 /3 , (52) 
where c 3 and C4 are positive constants depending only on f3, L and a. 
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Proof. We first prove (51) in the case where h — 1, i.e. \J Xo ,h = I- Then by the definition 
of a, we have 

Mi (a) = ^(a - L\\x - rr ||^) + L||:r - x ||^ = o 2 . (53) 
xei 

Let h = (a/L) 1 ^. Then a — L\\x — XqW^ > if and only if ||x — £o||oo < ^- So from (53) 
we get 

L 2 h P Yl \\x-xo\\i-L 2 II* - solE = (J 2 . (54) 

||ic — xo||oo</i ||x— xo||oo<7i 

By the definition of the neighborhood U it is easily seen that 

Nh 7-/3+2 

£ \\x - Xo\L = Y = ZN 2 ^ (1 + o (1)) 

||x— Xo\\oo<h k=l 



and 



Nh 77 2/3+2 
J2 \\x - xolffi = 8iV^ £ A; 2/3+1 = 8^ 2 ^ (1 + o (1)) . 

||x — Xo||oo<^ fc = 1 



Therefore, (54) implies 

8L2/? -iV 2 ^ +2 (l + o(l))= ( x 2 ) 



(/3 + 2) (2/3 + 2)' 
from which we infer that 

h = Q)ti^(l + o(l)) (55) 
with c = ( g2( ^8iy +2) ) ^ • From ( 55 ) and the definition of fc, we obtain 

a = Llf = Lcg/T^l + o(l)), 

which prove (51) in the case when h = 1. 

We next prove (51) under the conditions of the lemma. If h > c n _a , where < a < 
2^2, then it is clear that h > h for n sufficiently large. Therefore M h (a) = M 1 (a), thus 

we arrive at equation (53) from which we deduce (55). If h > c n~ 2 P+ 2 and Co > c± : then 
again h > h for n sufficiently large. Therefore M h (a) = Mi (a), and we arrive again at 
(55). 

We finally prove (52). Denote for brevity 

G h = £ (h"-\\x-xo\0 + . 

Since h > h for n sufficiently large, we have M h (a) = (a) = a 2 and Gh = G^. Then 
it is easy to see that 

m = - 2 '- 

a 2 a 



lg t ; 
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Since 



G h = (h? ~ \\ x ~ x o\\L) 

E 8*-^ E « M 

l<k<Nh l<k<Nh 

4/3 N^ + \l + o{l)) 



(3 + 2 
4(3 



{(3 + 2) LW 



N 2 a {P+2)/P (1 + (1)), 



we obtain 



<? (-) = - A1 ^L^-^ (1 + o (1)) = c 5 n"^ (1 + o (1)) , 
where c 4 is a constant depending on /3, L and er. ■ 
Proof of Theorem 3. As p(x) — \f (x) — / (x ) \ + S n , we have 

J2 w(x)p(x)\ < ^)|/(x)-/(x )|+^ 

2 



<2 i w(x)\f(x)-f(x )\ \ +251 

Hence 

g p {w) < 2g(w) + 25 2 n . 

So 

9 P (w„) < g p {w) < 2g(w) + 25 2 n . 
Therefore, by Lemma 7 and the condition that 5 n = O (n~ 2 f>+ 2 we obtain 

9p{w p ) = O (rCW^i . 

This gives (26). 

5.4 Proof of Theorem 4 

We begin with a decomposition of j5^ (x). Note that 
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Recall that M' = card U' Xo>h = nh 2 /2, m" = card V£ 0j)J = nr] 2 /2. Let T XOjX be the 
translation mapping T XOjX y = x + (y - x ). Denote A X0)X (y) = /(y) - f(T XQ;X y) and 

C(y) = z(y)- £ ( T x ,xy)- Since 

y(y)-y(^y) = ^ (y) + C(y), 

it is easy to see that 

lit 



veV" 



where 



= A 'o»> (57) 

t/fV ;/ 

= -251 (x) + S 2 (x) (58) 



with 



Si(x) = E A *o,* (y) C (y) , 



w = i E (ca/) 2 -2- 2 ). 



Notice that ES^x) = ES 2 (x) = ES (x) = 0. Then obviously 



d (Y>> !r) ,Y>>J - aV2 = ^A 2 (x) + S(x)+2a 2 -V2^ 2 

A 2 {x) + S{x) 

^A 2 {x) + S(x)+ 2a 2 + V2^' 

First we prove the following lemma. 



(59) 



Lemma 8 Suppose that the function f satisfies the local Holder condition (25). Then, 
for any x E U' xo h , 

\p\ Xo (x) - 2L 2 ^ < A 2 (x) < 3p 2 Lxo (x) + QL 2 r] 213 . 

Proof. By the decomposition 

/ (y) - / (T X0 , X (y)) = [f (x ) - f (x)} + [/ (y) - f (x )} + [/ (x) - f (T XQ:X („))] 
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*2 



A 2 (x) 



■ < Q (7,2 , h 2 , 3\ 


TO7"P ohf fllTl 


i E (/(y)- 

to * — ' 








A E (/(*«)- 


















/(^xo,x (^/))) 2 



By the local Holder condition (25) this implies 

A 2 (x) < 3 (/ (x ) - f (:r)) 2 + 3L 2 ^ + 3L 2 V 2 ?, 

which gives the upper bound. The lower bound can be proved similarly using the inequal- 
ity (a + b + cf > \a 2 - b 2 - c 2 . m 

We first prove a large deviation inequality for S(x). 

Lemma 9 Let S(x) be defined by (58). Then there are two constants C\ and c 2 such that 
for any < z < C\ (m") 1 ^ 2 , 

P (\S(x)\ > -L= j < 2exp (-c 2 z 2 ) . 
V vm" / 

Proof. Denote £(y) = { {yf - 2a 2 - 2A XQ:X (y) ( (y) . Since C (y) = e(y) - e(T XQjX y) is a 
normal random variable with mean and variance 2<r 2 , the random variable £ (y) has an 
exponential moment, i.e. there exist two positive constants to an d C3 depending only on 
/3, L and a 2 such that 4> y (t) = Ee*^ < c 3 , for any \t\ < t . Let ip y (t) = ln0 y (t) be the 
cumulate generating function. By Chebyshev's exponential inequality we get, 

F{S(x) > zVm"} < exp \ -tzVm" + Yl f ' 

{ vev" xo , v J 

for any \t\ < to an d for any z > 0. By the-three terms Taylor expansion, for \t\ < to, 

^(t) = ^(o)+^;(o) + ^;w, 

where \0\ < 1, ^„(0) = 0, ^(0) = Ef (y) = and 

" wU (0,W) 2 
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Since, by Jensen's inequality Ee*^ 3 ^ > = 1, we obtain the following upper bound: 

<(0<«(0 = Ee 2 (y)e <€(tf) . 

Using the elementary inequality x 2 e x < e 3x , x > 0, we have, for \t\ < t /3, 

m < |E (f ^ < ^Ee**® < |c 3 . 
This implies that for \t\ < t , 

< 1>y(t) < 

and 

P (s(x) > zV^*) < exp{-tzVm^ + ^|m" 2 }. 
If t = Ciz/y/m" < to/3, where c 4 is a positive constant, we obtain 

P > zv 7 ^ 7 ) < exp |-c 4 ^ 2 ^1 - ^|c 4 ^ | . 

Choosing c 4 > sufficiently small we get 

P (S(x) > zVm"^) < exp (-c 5 z 2 ) 

for some constant c 5 > 0. In the same way we show that 

P (S(x) < -zVrr?^ < exp (-c 5 z 2 ) . 

This proves the lemma. ■ 

We next prove that p" (x) is uniformly of order O {n~ 2 ^+ 2 Vhi nj with probability 

1-0 {n~ 2 ) , if h has the order n" 1 ^. 

Lemma 10 Suppose that the function f satisfies the local Holder condition (25). Assume 

that h = dn~^ with c 1 > c = (^^frf^) ^ and that r] = c 2 n~ 2 ~F+ 2 ~. Then there 
exists a constant C3 > depending only on f3, L and a, such that 

P i max p^ o (x) > c 3 n~vb Vh^l \ = O (n~ 2 ) . (60) 

Proof. Using Lemma 9, there are two constants c 4 , c 5 such that, for any z satisfying 

< z < c 4 (m") 1/2 , 

PI max \S(x)\>-Z=) < V Pf|5(x)|> " 



< 2m" exp (— c 5 z 2 ) 
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Recall that m" = nrj / 2 = cjn 2 ^ 2 . Leting z = v c 6 log m" and choosing C6 sufficiently 
large we obtain 

P| max \S(x)\ > c 8 n-wriVh^l) < (61) 
\**K ,h - J ™ 2 

Using Lemma 8 and the local Holder condition (25) we have A 2 (a;) < cL 2 h 213 , for x G \J' XQ h . 
From (56) and (59), with probability 1 — (n~ 2 ) , we have 



^ 1 w A 2 W + \S(x) 

max p (x) < max 



*zK ,h " ^u; k v/A 2 (x) + S{x)+ 2a 2 + 
cL 2 h 213 + Cg n~2^2 Vlnn 



< 



Since h = O (n 2 < i + 2 ) , this gives the desired result. ■ 

We then prove that given {Y(x),x G I" }, the conditional expectation of \fh„(xo) — 
/(^o)| is of order O ^n~ 2 ' 3 + 2 Vhi nj with probability 1 — (n~ 2 ) . 

Lemma 11 Suppose that the conditions of Theorem 4 (ire satisfied. Then 

P (E{\f^(x Q ) - f(x )\ 2 | Y(x), x G O > crT^ Inn) = 0(n~ 2 ), 

where c > is a constant depending only on (3, L and a. 

Proof. By (29) and the independence of e(x), we have 

H\ti, v (xo)-f(xo)\ 2 I Y(x),xei: } 



Since pf iXQ (x) < Lh 13 , from (62) we get 



(62) 



E{\f^(x )-f(x )\ 2 | Y(x),xeK } 
<L 2 ^ + ( r 2 ^ w"\x) 



< E «w + ^ E ^)+^ ( 63 ) 



XQ,h / XQ,h 
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Let w\ = axgmmgi(w), where 



9i 



As w" minimizes the function in (30), from (63) we obtain 
E{|/^>o)-/K)| 2 \Y(x),xeI'> } 

< ( E +° 2 E <(*)+^ 2/3 - 



(65) 



By Lemma 10, with probability 1 — 0(n 2 ) we have 



E ^iW^o^) - Cl " 2f9+2 v / hi 



71. 



XQ,n 



Therefore by (65), with probability 1 — 0(n 2 ), 

n\fk v (^)-f(x )\ 2 1 7(4^1;} 

- ° 2 E K 2 ^) + c 2 n _5 ^ In n + L 2 /i 2/3 

X6U' . 



2/3 

< ^(w*) + c 2 n"^+2 Inn + L 2 h w . 

This gives the assertion of Lemma 13, as h 2/3 = O {rT 2 i , + 2 j and <7i(tu*) = O (n~ 2 f>+ 2 j , 

by Lemma 7 with U^. o h instead of U^^. ■ 

Now we are ready to prove Theorem 4. 
Proof of Theorem 4- Since the function / satisfies Holder's condition, by the definition of 
gi(w) (cf. (64)) we have 



gi(w)< e ^ + ff2 E w2 (*) 

< L 2 /^ + ( x 2 < L 2 + a 2 , 



so that 



e(i^>o)-/k)i 2 i n4^i:)<#")<i 2 +^ 2 



Denote by X the conditional expectation in the above display and write !{■} for the 
indicator function of the set {■}. Then 

2/3 2,3 

EX = EX ■ 1{X > cn w Inn} + EX • 1{X < cn'w hm} 

2/3 2/3 

< (L 2 + a 2 ) P{X > ck^w Inn} + cn" 2 ^ 2 hm. 
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So applying Lemma 11, we see that 

e(|/U^)-/(^o)| 2 )=ex 



< 0(n ) + cn w+ 2 Inn 
= (n~2%+2 Inn ) . 



This proves Theorem 4. 



5.5 Proof of Theorem 5 



We keep the notations of the prevoius subsection. The following result gives a two sided 
bound for j5£ (x). 

Lemma 12 Suppose that the function f satisfies the local Holder condition (25). Assume 

- i i 

that h = c\n a with c\ > and a < on ^ ^ ia ^ W = ° 2n 2/3+2 ■ Then there exists positive 

constants c 3 , c 4 , c 5 and c 6 depending only on (3, L and a, such that 

P | max (fix (x) ~ c 3Pf, X0 ( x )) < c 4 n~^Vhni i = 1-0 (n~ 2 ) (66) 

(67) 



and 



C8 

n 2 



P { max (pl xo (x) - c 5 % Q (x)) < c 6 n"^ vln^l = 1-0 (n~ 2 ) . 
Proof. As in the proof of Lemma 10, we have 

Pi max \S(x)\ > c 7 n'^V\nn | < 
Using Lemma 8, for any x G h , 

\pl X0 (x) - < A 2 (x) < 3p% (x) + QLW". (68) 

From (56) we have 



^/A 2 {x) + S{x) + la 2 + V2o^ 
For the upper bound we have, for any x G h , 

/ / „ x r -x+ 3p 2 (x) + 6L 2 rj 2 P + \S(x) 

(*) = {d (y;„ y^j - ^) < ^ — ^ 

Therefore, with probability 1 — (n~ 2 ) , 

max fe (,)-^Ml < ^ + c 7 n"^v^ 



< c 8 n~2^F2A/lnn. 
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For the lower bound, we have, for any x G U' h , we have 



%W = (d (Y>> !V ,Y>>J - aV2 



(A 2 (x) + S(x)) 



+ 



> 



^A 2 (x) + S(x] 


\ + 2a 2 + V2a 2 


(A 2 i 


[x) + S(x)) + 


J A 2 (x) + c 7 n" 


wVhin + 2o- 2 + \/2o 2 



> c 9 (A 2 (x) + S(x)) + 

> c 9 (A 2 (x)-\S(x)\). 

Taking into account (68), on the set {max^u' \S( X )\ < c 7 n~ 2 < 9 + 2 Vhinl , 

Ux) > c 9 (^pl X0 (x)-2L 2 ^-\S(x)\^ 
> Cio (p/ j;ro (x) - r] w - n~^\/\an s j 
Therefore, with probability 1 — (n~ 2 ) , 

max (ci ^ (a;)-^ (x)) < c 10 fr] 2 ^ + n"^Vh^l) 



< cnn 2 ' 3 + 2 A/hi 



n. 



So the lemma is proved. ■ 

We then prove that given {Y(x),x e I" }, the conditional expectation of |/^(x ) — 

/(a?o)| is of order O |n"^ n j with probability 1-0 (n~ 2 ). 
Lemma 13 Suppose that the conditions of Theorem 5 are satisfied. Then 

p(e{|/^(x )-/(x )| 2 | Y(x),xei: }>cn-^\nn) =0(n- 2 ), 
where c> is a constant depending only on (3, L and a. 
Proof. By (29) and the independence of e(x), we have 

E{i/Ux )-/Ni 2 |n4^i:„}< ( E E ™" 2 w- 

Since, by Lemma 12, with probability 1 — (n~ 2 ) , 

max (p 2 f>Xo (x) - ci^ (x)) < c^n" 2 ^ v^, 
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we get (with probability 1 — O (n 2 )), 

E{|/^o)-/(^o)| 2 | Y(x),xeK Q } 



< c 3 



j ™"( X )\/Px ( x ) +c 2 n~^Vh^l + a 2 Y w" 2 {x). (69) 



A simple truncation argument, using the decomposition 



fix ( X ) 



gives 



^ w"(x)yj^jx) < n 22 /+ 2 ^2 w" (x) + n,2 ^ w"(x)^ 

< n -hwv2 +n iwv2 ^ w"(x)f?x (x). (70) 



From (69) and (70) one gets 

E{|/^>o)-/(xo)| 2 |y(x),xel^} 

< c 4 n^ «5%c)p£ (a;) ] +o- 2 Y w" 2 (x) \+c 5 n~^\Ann. 

\\ X ^,H / X ^> x0th J 

Let w\ = argmin gi(w), where g\ was defined in (65). As w" minimize the function in 

w 

(30), from (63) we obtain 

E{\f^(x ) - f(x )\ 2 | Y(x),xeK } 



< c 4 n 2 P+ 2 j j ^2 w i( x )fix ( x )\ + " 2 ^2 +c 5 n 2 /+ 2 v / rnn. 



(71) 



By Lemma 12, with probability 1 — O (n 2 ) , 

max (pf^(x) - c 6 p 2 f xo (x)) < c 7 n^Vhn. 

Therefore, with probability 1 — 0(n~ 2 ), 

E{|/^o)-/(*o)| 2 | Y(x),xel'> } 

< c 8 n^ J2 w *i( x )Pf,xo( x )\ +° 2 J2 <\ x )\ +c 9 n-^V^ 

fi p . 

= c 8 n 2 ' 3 + 2 gi{w{) +c 9 n 2 /5+ 2 Vlnn. 



n 
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This gives the assertion of Lemma 13, as gi(wl) — O (n 2 "+ 2 j by Lemma 7 with U' XQth 
instead of U xo ,h- ■ 

Proof of Theorem 5. Since the function / satisfies Holder's condition, by the definition 
of gi(w) (cf. (64)) we have (see the proof of Theorem 4) 

gi M <L 2 + a 2 

so that 

E (|/U*°) - /(^o)l 2 | E K Q ) < 9l (w") < L 2 + a 2 . 

Denote by X the conditional expectation in the above display. Then 

EX = EX • 1{X > cn'vh Inn} + EX • 1{X < cn~^ Inn} 
< (L 2 + a 2 ) P{X > cn"W2 Inn} + cn" 2 ^ 5 l nn . 

So applying Lemma 13, we see that 

E(\f^(x )-f(x )\ 2 ) =EX 

< 0(n ) + cn 2/3+2 inn 
= O (n~2^T2 innj . 

This proves Theorem 5 



6 Conclusion 

A new image denoising filter to deal with the additive Gaussian white noise model based 
on a weights optimization problem is proposed. The proposed algorithm is computa- 
tionally fast and its implementation is straightforward. Our work leads to the following 
conclusions. 



1. In the non-local means filter the choice of the Gaussian kernel is not justified. Our 
approach shows that it is preferable to choose the triangular kernel. 

2. The obtained estimator is shown to converge at the usual optimal rate, under some 
regularity conditions on the target function. To the best of our knowledge such 
convergence results have not been established so far. 

3. Our filter is parameter free in the sense that it chooses automatically the bandwidth 
parameter. 

4. Our numerical results confirm that optimal choice of the kernel improves the per- 
formance of the non-local means filter, under the same conditions. 
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